ABSTRACT

In clinical trials, adaptive allocation means that
the therapies assigned to the next patient or patients depend on the
results obtained thus far in the trial. Although many adaptive
allocation procedures have been proposed for clinical trials, few
trials have actually used adaptive assignment, largely because classical
frequentist measures of inference are difficult or impossible to
calculate when the allocation is adaptive. The general problem of
making inferences in clinical trials, whether randomized, adaptive,
or open, is discussed; and Bayesian inference is described as being
well-suited to the scientific method. Bayesian analyses of adaptive
and other studies are illustrated with examples drawn from the
following studies: (1) a study by R. H. Bartlett and others (1985)
involving 12 patients; (2) a 39-patient study by J. H. Ware (1989);
and (3) a study by D. O. Dixon and others (1989) involving 16
patients. These studies illustrate that Bayesian inference may be
possible in clinical trials, but adjusting for covariates is essential.
Three data tables and two graphs are included. A 27-item list of
references is provided. (SLD)
ABSTRACT

Many adaptive allocation procedures have been proposed for clinical trials. 
Few trials have used adaptive assignment. A principal reason is the inability 
to use classical statistical inferences with adaptive procedures. The general 
problem of making inferences in clinical trials, whether randomized, adaptive, 
or open, is discussed. Bayesian inference is described and illustrated in three 
actual trials, each with a different design. 

1. Introduction 

The focus of adaptive allocation methodology is on design. A principal 
reason such methodology is so infrequently used in actual clinical trials is the 
difficulty in making classical frequentist inferences when using an adaptive
design. The focus of this paper is on inference. So I will address designs 
other than adaptive; these include open studies and other studies that do not 
have a particular design. 

The design of an experiment is the set of actions taken by the investigator 
during the course of the experiment. The design is adaptive if these actions 
can depend on results that are observed while the experiment is in progress. It 
is nonadaptive if they cannot (that is, if actions are constant functions of 
results). Few clinical trials have adaptive designs. 

In a typical randomized clinical trial (RCT), half the patients are randomly
assigned to an experimental therapy and the other half serve as controls. The
number of patients in the trial is part of the design. P-values are calculated
by adding the probabilities of results more extreme than those observed,
assuming no treatment difference. This calculation requires that the planned
design was actually followed; otherwise what is "extreme" changes and so does
the P-value, perhaps in an unknown way. In general, deviations from the design
invalidate classical statistical inferences. 

Practically every clinical trial deviates from its design in one way or
another--the most common deviation is probably a different number of patients
from that planned. It seems ludicrous not to be able to draw conclusions from
data honestly collected. So in calculating P-values, for example, we pretend
that the resulting design was the one planned! I see nothing really wrong with
this practice. The problem is that many statisticians fail to see that
essentially every P-value that's ever been calculated is necessarily flawed as a
measure of inference. A consequence is that when it comes to other closely
related but better understood and more openly discussed practices, they take
P-values too seriously.

These latter practices are controversial and include adjusting for multiple 
comparisons, multiple tests, and interim analyses (O'Brien 1983, Berry 1985, 
1987, 1988a, 1988b). The analogs of calculations based on the resulting design 
are P-values that ignore multiplicities; these are called nominal P-values. 

Interim analyses are especially appropriate in any discussion of adaptive 
methods. Accumulating data are analyzed periodically with the possibility of 
early stopping. But interim analyses must be planned in advance so "more
extreme" results can be specified, and the probability of such data calculated
under the various hypotheses. If they are not planned then literally correct
P-values and literally correct confidence intervals cannot be calculated, even if
early stopping did not occur (Dupont 1983). Nominal P-values can of course be
calculated. These serve as perfectly fine descriptive statistics. But, as 
Brown (1983) and Canner (1983) make clear, nominal P-values are irrelevant as 
measures of inference. We've already seen that essentially every P-value ever 
calculated is similarly flawed, though perhaps not as openly or obviously. It 
is splitting hairs to object in some instances and not in others. 

The subject of this session is adaptive allocation. Adaptive allocation
means that the therapy assigned to the next patient, or therapies assigned to
the next group of patients, depend on results obtained thus far in the trial.
Most published adaptive allocation procedures tend to assign therapies that have 
been performing better (for many examples see the Bibliography of Berry and 
Fristedt 1985). 

Current biostatistical practice dictates that analyses of clinical trial 
data are tied as closely as possible to the trial's design. As I indicated
earlier, classical frequentist measures of inference are difficult or impossible
to calculate when a trial's design is adaptive--with the accent on "impossible"
when allocation is adaptive. This is one of many reasons adaptive allocation is
so infrequently used in actual trials. Some of the other reasons given by Simon
(1977), Armitage (1985), and Peto (1985), among others, are quite valid. These 
latter reasons substantially limit the practical usefulness of adaptive 
allocation methods. But the fact that classical inference is impossible in a 
legitimate scientific enterprise means to me that we should abandon classical 
inference rather than abandoning the enterprise! 

I will expand on this statement in the next section, showing that classical
inference is counter to the scientific method. In Section 3 I will describe how 
Bayesian inference applies to adaptive designs. And in Section 4 I will give 
some examples of Bayesian analysis. 

2. The Scientific Method and Adaptation 

The process of scientific research is given in the following six steps: 

1. Ask a question or pose a problem. 

2. Assemble and evaluate the relevant information. 

3. Based on current information, design an investigation or an experiment 
(perhaps the null experiment) to address the question posed in step 1. 
Consider costs and benefits--including information content--of the available
experiments. Recognize that step 6 is coming. 

4. Carry out the investigation or experiment. 
5. Use the evidence from step 4 to update the previously available information;
draw conclusions, if only tentative ones. 

6. Repeat steps 3 through 5 as necessary. 

Questions addressed in clinical research usually deal with the effectiveness
of therapies. The set of available experiments includes clinical trials. Costs
are in terms of time and resources, just as in other scientific research. But,
and this sets clinical research apart from other scientific research, costs also
include ineffective medical therapy--for patients in and out of the trial.

In this scientific process, learning takes place as the experimental results 
accrue. Suppose an experiment can be decomposed into two separate experiments
with no additional costs. After the first of these is carried out the available
information is updated (suppose at no or negligible cost). Based on this new
information the second half of the original experiment may now be unnecessary,
or perhaps a radically different next experiment is appropriate. Continue in
this way to partition a contemplated "experiment" into its smallest possible
pieces, with information updated continuously. There is a net benefit provided
by the possibility of deviating from the original plan. (To see this notice
that one option that's always available is to stick with the original plan.)
This assumes that updating is costless. In clinical trials this assumption is
at best approximately true. But the assumption may be reasonable in those
clinical trials where the cost of ineffective treatment far outweighs other
costs. It also assumes that there is information that accrues during the trial;
in some trials the responses are not observed until the trial is over (though in
survival studies at least partial information becomes available at each analysis
epoch).

The scientific process described here is the motivation behind the 
recommendations to use adaptive allocation procedures. In standard approaches
to RCTs investigators are supposed to close their eyes to accumulating data,
and that seems unscientific. Adaptive procedures seem more scientific. But
most adaptive allocation procedures are as arbitrary and as unscientific as
RCTs. For example, consider the play-the-winner rule: the same therapy is used
after a success and therapy is switched after a failure. The investigator has 
eyes open, but is made to wear glasses that induce extreme myopia. Throwing out 
all previous knowledge and remembering only the last thing learned is hardly 
what I mean by updating. 

What kind of adaptive procedures are scientific? In deciding which 
experiment to carry out the investigator should consider costs and benefits
explicitly. For the sake of discussion let's restrict cost considerations to 
effective therapy. The question is, effective for whom? The answer gives rise 
to the "patient horizon", N, introduced by Anscombe (1963). The patient horizon 
is the number of patients (in the trial and not) who are in the population being 
treated and who will eventually be treated with one of the competing therapies. 
Anscombe describes the solution by dynamic programming. This is consistent with 
the scientific method. (He describes it in context of adaptive stopping, but 
the method applies as well to adaptive allocation.) The current experiment is 
designed knowing that later experiments are possible (cf. step 6). The value of
information to be gained in an experiment--information that will help treat
later patients--is weighed against the possibility of ineffective treatment of
patients involved in the experiment. 
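
The trade-off described above can be illustrated with a small dynamic program. This is only a sketch of a generic finite-horizon, two-therapy problem, not Anscombe's own formulation: it assumes independent uniform priors on the two success probabilities, dichotomous responses observed immediately, and a known horizon; the function name `optimal_expected_successes` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def optimal_expected_successes(horizon):
    """Maximum expected number of successes among `horizon` patients.

    Assumption (not from the text): independent uniform priors on the two
    success probabilities, so after s successes and f failures on a therapy
    the predictive probability of success is (s + 1) / (s + f + 2).
    """

    @lru_cache(maxsize=None)
    def value(sa, fa, sb, fb, remaining):
        if remaining == 0:
            return Fraction(0)
        # Predictive success probabilities under the current posterior.
        pa = Fraction(sa + 1, sa + fa + 2)
        pb = Fraction(sb + 1, sb + fb + 2)
        # Treat the next patient with A, then continue optimally.
        use_a = (pa * (1 + value(sa + 1, fa, sb, fb, remaining - 1))
                 + (1 - pa) * value(sa, fa + 1, sb, fb, remaining - 1))
        # Treat the next patient with B, then continue optimally.
        use_b = (pb * (1 + value(sa, fa, sb + 1, fb, remaining - 1))
                 + (1 - pb) * value(sa, fa, sb, fb + 1, remaining - 1))
        return max(use_a, use_b)

    return value(0, 0, 0, 0, horizon)
```

With a horizon of two patients the value is 13/12 rather than 1: the first response is used to choose the second treatment, which is the net benefit of being allowed to deviate from a fixed plan.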

The patient horizon is never perfectly known. And it clearly depends on the
safety and effectiveness profiles of the competing therapies, which are also
unknown. If one of the therapies turns out to be very effective then, while
still unknown, N will be larger than otherwise. This makes allocations to the
apparently inferior therapy more worthwhile. This effect is ignored by all the
adaptive methods I have seen proposed.

It is not my objective to recommend particular adaptive allocation 
procedures, nor to outline a possible role for such procedures in clinical
trials. But I venture the opinion that adaptive allocations will never be
widely used in clinical trials, and that this is appropriate. When ethical
considerations are not primary and N is large, the RCT is quite a satisfactory
design (Berry and Eick 1989), but good RCTs are much smaller than those
typically carried out. And in those settings where ethical considerations are
of prime importance, which can be accommodated by taking N = 1, well-documented
open studies are best, and I believe they will be used increasingly (Berry
1989b). Open studies are scientific, but they are at least as problematical for
classical inference as are adaptive studies. Bayesian inference may be possible
in open studies, depending on the degree of documentation, particularly as 
regards reasons for treatment assignment. 

I want to make one additional point about design and the scientific method. 
Partitioning an experiment as described earlier means that it is better to 
rethink the experimental process as frequently as possible. (I'm assuming that
thinking is costless--which it's not--and I'm assuming that the thinker is not
constrained by an unscientific process of inference.) In particular, large 
trials that don't allow adaptation are bad, and small trials are good. 
Stringing together small trials is flexible. The design of the next trial can 
be based on the results from previous studies, or the experimental plan can be 
abandoned. Using small studies is globally adaptive. Small studies are frowned 
upon by classicists (Peto et al. 1976). Making inferences requires analyzing 
data from various studies, each with its own peculiar characteristics:
meta-analysis. The Bayesian approach is ideally suited to this endeavor
(DuMouchel 1989). (However, publication bias and other similar biases can make 
correct inferences difficult or impossible in any approach: if I hide the 
smallest numbers in a variable sequence from you, and you think you've got the 
whole sequence, you're not going to do well in guessing how the sequence was 
generated! Of course, a Bayesian who understands that there may be publication 
bias will tend to do better than one who does not.) 

3. Flexibility of Bayesian Inference

I indicated in the introduction that the problem of multiplicities makes 
classical frequentist inference unsuited for adaptive designs; this statement 
applies for other scientifically valid designs as well. On the other hand, the 
scientific process outlined in the previous section is ideally suited for 
Bayesian inference. For example, updating one's state of knowledge is a
Bayesian notion. Also, step 3 requires evaluating the information content of 
possible experiments. Information content usually depends on the results of an 
experiment. Predictive probability distributions of observable results are 
anathema to classical inference, but they are easily and naturally formulated 
using Bayesian methods. I don't want to rule out the possibility that there are 
other approaches that are consistent with the scientific method described in the 
previous section, but classical frequentist methods are not. 
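
As a minimal sketch of such a predictive distribution, assume dichotomous responses from exchangeable patients and a Beta(a, b) distribution with integer parameters for the success probability; none of these specifics come from the text, and the helper names `beta_fn` and `predictive` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def predictive(successes, n_future, a=1, b=1):
    """P(`successes` successes among `n_future` future patients | Beta(a, b))."""
    return (comb(n_future, successes)
            * beta_fn(a + successes, b + n_future - successes)
            / beta_fn(a, b))
```

Under a uniform distribution (a = b = 1) every number of successes among n future patients is equally likely, with probability 1/(n + 1). Once responses are observed, the same function applies with a and b increased by the observed successes and failures.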

In the Bayesian approach the design used is irrelevant once the data are at 
hand (Berger 1985; Berger and Berry 1988a, 1988b; Berry 1987, 1988b). Here I 
mean "data" in the broadest possible sense; in particular, in an open study the 
data includes all information about the patients available to the clinician who 
assigned therapy. (The only problem I see with this is the impossibility of 
quantifying some types of such information. For example, the clinician might 
sense characteristics of a patient that are difficult to communicate and use as 
covariates. A possible solution is to have the clinician who assigns therapy be
different from the one who diagnoses.)

Consider the following trial. There are two possible therapies, A and B, 
and two responses, success (S) and failure (F). Patients in the trial are
assumed to be exchangeable insofar as their anticipated response is concerned.
The following are the results: 

Therapy AAABBBBBBAAA 
Response SSFFSFFSFSSS 

In the Bayesian approach the only information needed to analyze these
results is the sufficient statistics: 5 of 6 successes on A and 2 of 6
successes on B. (The assumption of exchangeability is critical here.) In
particular, the design is irrelevant. Many different designs could have
produced these data; here are a few:

(i) An RCT planned for 12 patients assigned randomly in blocks of six, three
on each therapy.

(ii) Randomized play-the-winner assignment (see Example 1 below for a
description) where sampling stops as soon as the absolute difference in
sample success proportions is at least 1/2.

(iii) An open study in which the clinician plans to use 3 A's, 6 B's, 6 A's, 6
B's, etc., until concluding that further use of either therapy would be
unethical, or until becoming tired.

(iv) An open study in which the clinician assigns therapy in an arbitrary
fashion, with some balance in mind, and the data given are interim
results.
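
The sufficient statistics quoted above can be recovered from the two sequences without reference to any of these designs. A minimal sketch (the variable names are illustrative only):

```python
from collections import Counter

therapy = "AAABBBBBBAAA"
response = "SSFFSFFSFSSS"

# Tally (therapy, response) pairs; under exchangeability the order is irrelevant.
counts = Counter(zip(therapy, response))

summary = {
    arm: (counts[(arm, "S")], counts[(arm, "S")] + counts[(arm, "F")])
    for arm in "AB"
}
print(summary)  # {'A': (5, 6), 'B': (2, 6)}
```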

The only reservation I have about the design affecting my conclusions is that 
there might be hidden data that would violate the assumption of exchangeability. 
For example, in design (iv) I would worry that the clinician might have assigned
therapy to patients based on covariates to which I am not privy (not that it's
wrong to do this, it's just that I want to know about it); had this happened
then I could not draw conclusions from the data unless I were told what the
covariates were (and perhaps not even then!). Similarly, in design (iii) I
would worry that the clinician had juggled the order of admission to, in effect, 
assign the sicker patients to one of the therapies. 

We do need to know the design to calculate P-values (and confidence
intervals). For design (i) I get P = 0.12 (exact test). For (ii), the
probability that A wins if there is no difference in therapies is P = 1/2.
P-values cannot be calculated for designs (iii) and (iv).
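
The value for design (i) can be checked by summing hypergeometric probabilities. The sketch below assumes the "exact test" is the one-sided Fisher exact test on the 2 x 2 table of 5 of 6 successes on A and 2 of 6 on B; the text does not name the test, but this reading reproduces 0.12. The function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(succ_a, n_a, succ_b, n_b):
    """P(successes on A >= observed | margins fixed, no treatment difference)."""
    total_succ = succ_a + succ_b
    total = n_a + n_b
    upper = min(n_a, total_succ)
    # Add the probabilities of the observed table and of every more extreme one.
    tail = sum(comb(n_a, k) * comb(n_b, total_succ - k)
               for k in range(succ_a, upper + 1))
    return tail / comb(total, total_succ)


p_value = fisher_one_sided(5, 6, 2, 6)
print(round(p_value, 2))  # 0.12
```

The set of tables counted as "more extreme" is fixed by the planned design, which is why no such sum can be written down for designs (iii) and (iv).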

In this section I will illustrate Bayesian analyses of adaptive and other
studies in the context of examples. The examples I give are clinical trials
with dichotomous responses; the ideas generalize easily to other types of 
trials. 

Example 1 (Bartlett et al. 1985)

This is one of the few clinical trials in which adaptive allocation has been
used. The analyses I present here are far from the final word. More complete
analysis is forthcoming in Berry and Hardwick (1989). A randomized
play-the-winner scheme was carried out as follows. An A and a B were placed in an urn
(figuratively speaking). One was selected randomly and the corresponding
therapy administered, A = experimental (ECMO) and B = control (conventional
therapy). If the response was survival (S) then the treatment letter was
replaced in the urn and another letter of the same type was added--response time
was effectively immediate. If the response was death (F) then the treatment
letter was replaced and a letter of the other type added. Stopping was to have
taken place when ten balls of either type had been added to the urn. The second
phase of the study was to be nonrandomized, with all patients assigned to the
therapy that performed better in the first phase.
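
The urn scheme just described can be simulated in a few lines. This is a sketch under stated assumptions: the stopping rule is read as "stop once ten letters of one type have been added", the true survival probabilities passed in are hypothetical inputs, and the function name is illustrative.

```python
import random


def randomized_play_the_winner(p_success, balls_to_add=10, rng=None):
    """Simulate the first phase of the urn scheme; return (winner, assignments).

    `p_success` maps 'A' and 'B' to hypothetical true survival probabilities.
    """
    if rng is None:
        rng = random.Random()
    urn = {"A": 1, "B": 1}      # one letter of each type to start
    added = {"A": 0, "B": 0}    # letters added since the start
    assignments = []
    while max(added.values()) < balls_to_add:
        # Draw a letter at random; the drawn letter is always replaced.
        arm = "A" if rng.random() < urn["A"] / (urn["A"] + urn["B"]) else "B"
        other = "B" if arm == "A" else "A"
        survived = rng.random() < p_success[arm]
        # Survival adds a letter of the same type, death one of the other type.
        target = arm if survived else other
        urn[target] += 1
        added[target] += 1
        assignments.append((arm, "S" if survived else "F"))
    winner = max(added, key=added.get)
    return winner, assignments
```

Because exactly one letter is added per patient, the first phase ends after at most 19 patients. When the two survival probabilities are equal the scheme is symmetric in A and B, so each therapy "wins" with probability 1/2.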

The responses reported by Bartlett et al. were as follows: 

Therapy ABAAAAAAAAAA 
Response SFSSSSSSSSSS 

(Note the deviation from the stopping rule.) After the trial, 8 more patients 
were administered A and all survived, and 2 more were administered B and both 
died. 

Suppose the patient population is homogeneous, so the patients are regarded
to be exchangeable. Let p_A and p_B be the probabilities of success on treatments
A and B. (In the next example I will describe a model in which these
probabilities depend on the patients' prognoses.)

Taking the classical frequentist point of view, Ware and Epstein (1985)
observe that the Bartlett et al. trial had a "50% false positive rate", or type
I error rate: if the null hypothesis p_A = p_B is true, then the probability of
obtaining 10 more A's than B's is 1/2. They say this rate is "unacceptably
high". This is an instance of what I mean by taking hypothesis testing too
seriously; in particular, it applies no matter how strongly the actual data
favors either therapy. Ware and Epstein conclude: "Further randomized clinical
trials using concurrent controls ... will be difficult but remain necessary."
(Hence the study described in Example 2.)

A Bayesian approach requires a prior distribution on (p_A, p_B). For
illustrative purposes only, suppose this is uniform. Such an assumption is
consistent with assuming the treatments to be exchangeable and independent a
priori, with little information available about either. (None of these
assumptions is correct--see below.) The posterior density (on the unit square)
given data from the trial (11 successes in 11 patients on A, 1 failure in 1
patient on B) is then

$$ f(p_A, p_B) = 24\, p_A^{11} (1 - p_B). $$

Consider the conditional relative improvement due to ECMO (compared with
conventional therapy): p_A - p_B. Define the (unconditional) relative improvement
to be the probability that this is greater than c:

$$ RI(c) = P(p_A - p_B > c \mid \text{data}) = \iint_{x - y > c} f(x, y)\, dx\, dy. $$

This is labeled with an asterisk in Figure 1; in particular, the posterior
probability that ECMO is better than conventional therapy is RI(0) = 90/91.

Consider also the 10 patients reported by Bartlett et al. that were treated
after the trial. According to the protocol, all 10 should have been assigned to
ECMO. My understanding is that all met the eligibility criteria for the study
but the ECMO device was not available for the two who were assigned to
conventional therapy. Considering these 10 to be exchangeable with the patients
in the trial (19 successes in 19 patients on A, 3 failures in 3 patients on B)
means that

$$ f(p_A, p_B) = 80\, p_A^{19} (1 - p_B)^{3}. $$

The relative improvement function, RI(c), for this density is labeled with a
double asterisk in Figure 1; now RI(0) = 0.9999.
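
Both values of RI(0) can be reproduced exactly. The sketch below assumes independent uniform priors, as in the illustration above, so that each posterior is a beta distribution with integer parameters; the helper names `beta_fn` and `prob_a_better` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def prob_a_better(succ_a, fail_a, succ_b, fail_b):
    """P(p_A > p_B | data) under independent uniform priors."""
    a, b = succ_a + 1, fail_a + 1      # posterior of p_A is Beta(a, b)
    c, d = succ_b + 1, fail_b + 1      # posterior of p_B is Beta(c, d)
    n = c + d - 1
    # For integer c, d: P(p_B < x) = P(Binomial(n, x) >= c); average over p_A.
    return sum(comb(n, j) * beta_fn(a + j, b + n - j)
               for j in range(c, n + 1)) / beta_fn(a, b)


print(prob_a_better(11, 0, 0, 1))   # 90/91
print(prob_a_better(19, 0, 0, 3))   # 10625/10626
```

The second value is 1 - 1/10626, which rounds to the 0.9999 quoted above.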

Bartlett et al. claim that the patients in the trial would have had at least
an 80% death rate on conventional therapy. A Bayesian analysis can incorporate
historical controls (Berry and Hardwick 1989)--indeed, the scientific method
requires using all available information. But in an ostensibly scientific
report, any such statement should be backed up by evidence. In this instance
the issue is critical. If p_B is known to be 0.2, say, then

$$ RI(c) = P(p_A > 0.2 + c \mid \text{data}) = \int_{0.2+c}^{1} f(p_A)\, dp_A; $$

this is 1 - (0.2 + c)^{12} for 11 successes out of 11 patients on A, and
1 - (0.2 + c)^{20} for 19 successes out of 19 on A. These are shown in Figure 2,
using the same labeling system as in Figure 1. The relative improvement of A is
dramatic under this assumption. For example, in the second case,
RI(0.5) > 0.999, so ECMO is very likely to save an additional 50% of the
patients as compared with conventional therapy.
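
A short numerical check of these curves, assuming a uniform prior for p_A and p_B fixed at 0.2; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(c, successes_on_a, p_b=0.2):
    """P(p_A - p_b > c | data) when every patient on A succeeded.

    With a uniform prior and s successes in s patients, p_A is Beta(s + 1, 1),
    whose distribution function is x ** (s + 1).
    """
    threshold = min(max(p_b + c, 0.0), 1.0)
    return 1.0 - threshold ** (successes_on_a + 1)


for c in (0.0, 0.25, 0.5, 0.75):
    print(c, round(relative_improvement(c, 11), 4),
          round(relative_improvement(c, 19), 4))
```

For c = 0.5 and 19 successes this gives 1 - 0.7^20, which exceeds 0.999.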

The patients in this trial were not actually exchangeable. (Not 
incidentally, the patient who received conventional therapy in the randomized 
phase happened to be the sickest of the 12.) Berry and Hardwick (1989) carry 
out a Bayesian analysis accounting for the patients' characteristics, as well as
incorporating historical controls. 



Figure 2. Relative improvement for A over B assuming
p_B = 0.2 and uniform prior for p_A.



Example 2 (Ware 1989) 

Experimental (A) and control (B) were the same as in Example 1. The trial
was in two phases. Phase 1 was balanced randomized and would stop when either
therapy accumulated 4 deaths. The other therapy would be used exclusively in
phase 2, which would end when this other therapy accumulated a total of 4
deaths. (In view of the Bartlett study (Example 1) and other available
information on ECMO and conventional therapy, I think this trial--or any trial
randomizing to conventional therapy--was unethical; cf. Berry 1989b.)

The results are shown in Table 1. Note that phase 2 stopped with only one
ECMO death. See Ware (1989) for the way he gets around this obvious stumbling
block for classical inference. 
Table 1. Data from Ware (1989)*

Patient                  Initial
number     Treatment     prognosis**    Response    P(p_A > p_B | data)

PHASE 1
   1       A             0.754          S           0.59
   2       B             [illegible]    S           0.48
   3       A             [illegible]    S           0.51
   4       [illegible]   [illegible]    S           0.46
   5       B             [illegible]    F           0.71
   6       A             [illegible]    S           [illegible]
   7       A             0.886          S           0.77
   8       B             0.842          S           0.74
   9       B             [illegible]    S           [illegible]
  10       B             0.844          F           [illegible]
  11       A             0.874          S           [illegible]
  12       A             0.877          S           [illegible]
  13       B             0.788          F           [illegible]
  14       A             0.902          S           [illegible]
  15       A             0.922          S           [illegible]
  16       B             0.826          S           [illegible]
  17       B             0.874          S           [illegible]
  18       A             0.871          S           [illegible]
  19       B             0.838          F           [illegible]

PHASE 2
  20       A             0.900          S           [illegible]
  21       A             0.716          F           [illegible]
  22       A             0.960          S           [illegible]
  23       A             [illegible]    S           [illegible]
  24       A             [illegible]    S           [illegible]
  25       A             [illegible]    S           [illegible]
  26       A             [illegible]    S           [illegible]
  27       A             0.874          S           0.965
  28       A             0.774          S           0.971
  29       A             0.941          S           0.973
  30       A             0.615          S           0.981
  31       A             0.825          S           0.984
  32       A             0.865          S           0.985
  33       A             0.775          S           0.988
  34       A             0.832          S           0.989
  35       A             0.792          S           0.990
  36       A             0.874          S           0.991
  37       A             0.770          S           0.992
  38       A             0.735          S           0.994
  39       A             0.921          S           0.994

*  The order of patient responses and covariates used to calculate prognoses are
not given in Ware (1989); Professor Ware was kind enough to provide these to me.

** Predicted probability of success on treatment A from Toomasian et al. (1988).

Table 1 also shows the individual patients' prognoses. The method of
computing these was taken from Toomasian et al. (1988) who report on a national
registry of 715 ECMO cases and calculate a logit for survival (= success).
Their model is

$$ \text{logit} = 20.054 - 0.918(\text{birthweight}) - 2.465(\text{pH}) - 0.386(\text{MAS}) + 0.597(\text{renal failure}) + 0.304(\text{female}). $$

The last three variables are indicators; MAS means meconium aspiration syndrome
as primary diagnosis; renal failure was defined as creatinine ≥ 1.5. Since I
did not have access to the last two variables, I used only the first four terms.
(Dropping the last two would have no effect if all the patients were male and
none had renal failure--about 10 percent of the 715 cases reported by Toomasian
et al. had renal failure.)

There was evidence that ECMO was more effective than conventional
therapy--see Example 1 and Ware (1989). But I calculated
RI(0) = P(p_A > p_B | data) in
Table 1 assuming that A and B were exchangeable initially, and using a technique
proposed by Berry (1989a) with a = 2. All previously treated patient responses,
treatments, and prognoses are included in "data". Such a measure can be
calculated at any time during the trial, even if it may result in early
stopping, without compromising the eventual conclusions (Berry 1985, 1987).

The probabilities in Table 1 are not P-values. Rather, they have a direct
interpretation concerning the two therapies. Namely, P(p_A > p_B | data) is the
probability that therapy A is the better treatment to assign to the next
patient.

The probabilities in Table 1 assume that the prior distribution of (p_A, p_B)
remains unchanged during the trial. Any evidence that becomes available from
outside the trial can be used to update the current distribution of (p_A, p_B).


The ECMO patients in phase 1 had better prognoses (on ECMO therapy) than did
their counterparts on control: averages of 0.126 and 0.189, respectively. So 
the probabilities in the rightmost column of Table 1 are larger than they would 
be had the covariates been ignored. 

In clinical trials in which one therapy is used exclusively for a period of
time (phase 2 in the example), one worries that there may be a time trend in the 
patient population which then is confounded with treatment. (Indeed, this is a 
standard argument against using adaptive allocation.) The calculations shown in 
Table 1 adjust for any time trends that are manifest in the covariates used to 
calculate prognoses. Of course, it does not account for "silent" covariates. 
(The average prognosis in phase 2 is 0.173, giving an overall average of 0.158
for ECMO patients, so there seems to be at most a slight time trend in the
example data.) 

Table 2 gives the updated prognosis of patients on A and B using maximum
likelihood (see Berry 1989a). The fact that p_A(x) > x means that the ECMO
patients in the current study had better results than did their counterparts in
the national registry. (This difference cannot be the result of dropping "renal
failure" and "female" from the logit model since their coefficients are
positive.)
Publishing the results of this trial can include an updated prognosis for
both therapies, such as given in Table 2. If RI(c) or any other characteristic
of the current distribution of (p_A, p_B) is published, the prior distribution of
(p_A, p_B) should also be published. It is also incumbent on the authors to
indicate the sensitivity of the current distribution to the prior with, perhaps,
an indication of what the current distribution would be for different priors.



Table 2. Updated prognosis p(x) based on data from Table 1; x = initial
prognosis.

   x        p_A(x)     p_B(x)

  0.95      0.991      0.863
  0.90      0.980      0.750
  0.80      0.956      0.571
  0.70      0.928      0.437
  0.60      0.892      0.333
  0.50      0.846      0.250
  0.40      0.785      0.182


Therapies A and B are very different. ECMO is radical, invasive therapy
whose use could itself result in death. So it seems reasonable to assume
P(p_A = p_B) = 0, as I have done in this example. But this assumption seems less
appropriate in most settings, in particular, in that of the next example.

Example 3 (Dixon et al. 1989)

This is a balanced randomized trial comparing two treatments for adult acute
leukemia: A = amsacrine/cytosine arabinoside and B = mitoxantrone/cytosine
arabinoside. The responses are reported in Table 3. Success (S) is complete
remission and failure (F) is any response other than S. Initial prognosis is
the probability of complete remission based on a logistic model. The
calculation of P(p_B < p_A | data) uses Berry (1989a), as in Example 2.

The stopping rule used by Dixon et al. (1989) was based on a Bayesian
calculation after pairing A and B patients on the basis of prognosis. Their
method has the advantage of being easy to understand: 4 of 8 preferences for A
with the other 4 pairs tied. The method of Berry (1989a) does not assume
exchangeability of pairs, and it does not require matching patients on
prognosis. (Berry (1989a) gives an extension of the method to analysis of
survival times with the possibility of censoring.)
Table 3. Data from Dixon et al. (1989)

Patient                Initial
number     Treatment   prognosis*    Response       P(p_A > p_B | data)

   1       B           0.93          S              0.46
   2       A           0.78          S              [illegible]
   3       B           0.59          F              0.77
   4       A           0.44          [illegible]    [illegible]
   5       B           0.81          [illegible]    [illegible]
   6       A           0.68          [illegible]    [illegible]
   7       B           0.87          [illegible]    [illegible]
   8       A           0.87          S              0.87
   9       B           0.49          F              0.933
  10       A           0.78          S              0.945
  11       B           0.87          F              0.982
  12       B           0.74          S              0.971
  13       A           0.59          S              0.982
  14       A           0.50          S              0.989
  15       B           0.40          F              0.993
  16       A           0.93          S              0.994

* Predicted probability of success.



5. Conclusion 

Scientific research is planning and learning. Learning is adaptive. The 
scientific method prescribes how learning takes place efficiently. Bayesian 
inference is consistent with the scientific method. In particular, it is an 
ideal prescription for learning. Classical frequentist inference is
inconsistent with the scientific method. 

Adaptive allocation may not have a place in medical research. Trials in
which there are no ethical concerns are perhaps best carried out with
randomized, concurrent controls. But these trials should be small. This allows
for adaptivity--rethinking and modifying strategies between trials--which
can save time and resources and increase the chance of delivering effective
medical therapy to more people.

When ethical concerns rule out RCTs, treatment should be assigned in an open
fashion, with patients followed to ascertain effect. Correct inferences are 
difficult in open studies, at least in part because of the possibilities of bias 
in assigning treatment. Classical frequentist methods are not available; 
Bayesian inferences may be possible, but adjusting for covariates is essential. 
These inferences will be better if based on control information from historical 
data. Appropriately weighing historical data is one of the biggest challenges 
in the analysis of clinical trials. 